{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "97816fd2-18e6-418f-9f9e-e5a1d5403404",
   "metadata": {},
   "source": [
    "### Descriptive Statistics:\n",
    "\n",
    "1. What is the purpose of descriptive statistics?\n",
    "2. Explain the differences between mean, median, and mode. When would you use each?\n",
    "3. What is a quartile and how does it relate to the interquartile range (IQR)?\n",
    "4. How do outliers impact measures of central tendency and dispersion?\n",
    "5. What is the significance of skewness and kurtosis in a dataset?\n",
    "6. Explain the concept of correlation and how it is measured.\n",
    "7. What is a histogram and how is it used in data analysis?\n",
    "8. Define standard deviation and variance. How are they related?\n",
    "9. Explain the difference between a population and a sample.\n",
    "10. How would you detect and handle missing values in a dataset?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d18068ee-a8e6-48c2-b46e-b3a90b86d414",
   "metadata": {},
   "source": [
    "### 1. What is the purpose of descriptive statistics?\n",
    "\n",
    "**Answer:** Descriptive statistics summarize and present the key features of a dataset, giving a concise overview of the data at hand without drawing conclusions beyond it (that is the role of inferential statistics). They include measures of central tendency (mean, median, mode), measures of dispersion (range, variance, standard deviation, interquartile range), and measures of shape (skewness, kurtosis), often paired with graphical summaries such as histograms and box plots. Together they describe the characteristics, distribution, and variability of the data.\n",
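    "\n",
    "A minimal sketch using only Python's standard `statistics` module, with made-up example values, shows how a few descriptive statistics condense a dataset into a short summary:\n",
    "\n",
    "```python\n",
    "import statistics\n",
    "\n",
    "# Hypothetical example data (e.g., exam scores)\n",
    "data = [72, 85, 90, 66, 85, 78, 95, 88]\n",
    "\n",
    "summary = {\n",
    "    'count': len(data),\n",
    "    'mean': statistics.mean(data),      # 82.375\n",
    "    'median': statistics.median(data),  # 85.0 (average of the two middle values)\n",
    "    'stdev': statistics.stdev(data),    # sample standard deviation (divides by n - 1)\n",
    "    'min': min(data),\n",
    "    'max': max(data),\n",
    "}\n",
    "print(summary)\n",
    "```\n",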
    "\n",
    "---\n",
    "\n",
    "### 2. Explain the differences between mean, median, and mode. When would you use each?\n",
    "\n",
    "**Answer:** \n",
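    "- **Mean:** The mean is the arithmetic average: the sum of all values divided by the number of observations. Because every value contributes to it, the mean is sensitive to outliers; it is best suited to numerical data with a roughly symmetric distribution and no extreme values.\n",
    "  \n",
    "- **Median:** The median is the middle value of the sorted data (for an even number of observations, the average of the two middle values). It is robust to outliers, so it is preferred when the data is skewed or contains extreme values (e.g., incomes or house prices).\n",
    "  \n",
    "- **Mode:** The mode is the most frequently occurring value. It is the only one of the three that applies to nominal categorical data, and it is also useful for discrete data with clear peaks. A dataset can have more than one mode (bimodal or multimodal), and if every value occurs equally often there is no meaningful mode.\n",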
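    "\n",
    "The following sketch (standard library only; the salary figures are invented for illustration) shows one extreme value pulling the mean upward while the median stays put. It also shows that the mode works for categorical data and that `statistics.multimode` returns every mode when there is a tie:\n",
    "\n",
    "```python\n",
    "import statistics\n",
    "\n",
    "# Hypothetical salaries (in thousands) with one extreme value\n",
    "salaries = [40, 42, 45, 47, 50, 52, 300]\n",
    "print(statistics.mean(salaries))    # about 82.3, pulled upward by the 300\n",
    "print(statistics.median(salaries))  # 47, unaffected by the 300\n",
    "\n",
    "# The mode also works for categorical data\n",
    "colors = ['red', 'blue', 'red', 'green', 'red', 'blue']\n",
    "print(statistics.mode(colors))                # red\n",
    "print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]\n",
    "```\n",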
    "\n",
    "---\n",
    "\n",
    "### 3. What is a quartile and how does it relate to the interquartile range (IQR)?\n",
    "\n",
    "**Answer:** \n",
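    "- **Quartile:** Quartiles are the three values that divide a sorted dataset into four parts, each containing about 25% of the observations. The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median (50th percentile), and the third quartile (Q3) is the 75th percentile. Several slightly different conventions exist for computing quartiles, so different software may return different values for small datasets.\n",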
    "  \n",
    "- **Interquartile Range (IQR):** The IQR is the range between the first and third quartiles (IQR = Q3 - Q1). It represents the spread of the middle 50% of the data, offering a robust measure of variability.\n",
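    "\n",
    "As a sketch with made-up data, the standard library's `statistics.quantiles` returns the three quartile cut points, and the IQR follows from them. The `method='inclusive'` argument selects one of the two conventions this function supports (the default is `'exclusive'`); other tools may use yet other conventions:\n",
    "\n",
    "```python\n",
    "import statistics\n",
    "\n",
    "data = [1, 3, 4, 6, 7, 8, 10, 12, 15]\n",
    "\n",
    "# quantiles(n=4) returns the three cut points Q1, Q2, Q3\n",
    "q1, q2, q3 = statistics.quantiles(data, n=4, method='inclusive')\n",
    "iqr = q3 - q1\n",
    "print(q1, q2, q3, iqr)  # 4.0 7.0 10.0 6.0\n",
    "```\n",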
    "\n",
    "---\n",
    "\n",
    "### 4. How do outliers impact measures of central tendency and dispersion?\n",
    "\n",
    "**Answer:** \n",
    "- **Central Tendency:** Outliers can heavily influence the mean, pulling it toward their extreme values. The median depends only on the middle of the sorted data, so it is much less affected, which makes it a robust measure in the presence of outliers.\n",
    "  \n",
    "- **Dispersion:** The range, variance, and standard deviation are highly sensitive to outliers. The range is set by the two most extreme values, and variance squares each deviation, so a single distant point can inflate it dramatically. Robust measures of spread, such as the IQR and the median absolute deviation (MAD), are much less influenced by outliers.\n",
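    "\n",
    "The sketch below (standard library only; the data and the `summarize` helper are hypothetical) compares summary statistics before and after adding a single extreme value:\n",
    "\n",
    "```python\n",
    "import statistics\n",
    "\n",
    "def summarize(values):\n",
    "    # Q1 and Q3 via the 'inclusive' quartile convention\n",
    "    q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')\n",
    "    return {\n",
    "        'mean': statistics.mean(values),\n",
    "        'median': statistics.median(values),\n",
    "        'stdev': statistics.stdev(values),\n",
    "        'iqr': q3 - q1,\n",
    "    }\n",
    "\n",
    "clean = [10, 12, 12, 13, 14, 15, 16]\n",
    "with_outlier = clean + [100]  # add one hypothetical extreme value\n",
    "\n",
    "print(summarize(clean))\n",
    "print(summarize(with_outlier))\n",
    "```\n",
    "\n",
    "In this example the mean jumps from about 13.1 to 24 and the standard deviation from about 2.0 to about 30.8. Over the same change, the median moves only from 13 to 13.5 and the IQR from 2.5 to 3.25.\n",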
    "\n",
    "---\n",
    "\n",
    "### 5. What is the significance of skewness and kurtosis in a dataset?\n",
    "\n",
    "**Answer:** \n",
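    "- **Skewness:** Skewness measures the asymmetry of a distribution. Positive (right) skewness indicates a longer tail on the right, where the mean typically exceeds the median; negative (left) skewness indicates a longer tail on the left. A value near zero suggests approximate symmetry, so skewness shows the direction and degree of departure from symmetry.\n",
    "  \n",
    "- **Kurtosis:** Kurtosis describes how heavy a distribution's tails are compared with a normal distribution. A normal distribution has a kurtosis of 3, which corresponds to an excess kurtosis (kurtosis minus 3) of 0. Positive excess kurtosis (leptokurtic) indicates heavy tails and more frequent extreme values; negative excess kurtosis (platykurtic) indicates light tails. Although kurtosis is often described in terms of a distribution's peak, it mainly reflects tail behavior.\n",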
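    "\n",
    "A minimal sketch of the moment-based (population) definitions in plain Python follows. The helper names and data are illustrative. In practice, SciPy provides `scipy.stats.skew` and `scipy.stats.kurtosis`, which offer options for bias correction:\n",
    "\n",
    "```python\n",
    "def skewness(xs):\n",
    "    # Moment-based skewness: third central moment / (second central moment)^1.5\n",
    "    n = len(xs)\n",
    "    mean = sum(xs) / n\n",
    "    m2 = sum((x - mean) ** 2 for x in xs) / n\n",
    "    m3 = sum((x - mean) ** 3 for x in xs) / n\n",
    "    return m3 / m2 ** 1.5\n",
    "\n",
    "def excess_kurtosis(xs):\n",
    "    # Moment-based kurtosis minus 3, so a normal distribution scores about 0\n",
    "    n = len(xs)\n",
    "    mean = sum(xs) / n\n",
    "    m2 = sum((x - mean) ** 2 for x in xs) / n\n",
    "    m4 = sum((x - mean) ** 4 for x in xs) / n\n",
    "    return m4 / m2 ** 2 - 3\n",
    "\n",
    "right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]\n",
    "symmetric = [1, 2, 3, 4, 5, 6, 7]\n",
    "print(skewness(right_skewed))         # positive: long right tail\n",
    "print(skewness(symmetric))            # 0.0: perfectly symmetric\n",
    "print(excess_kurtosis(symmetric))     # -1.25: lighter tails than a normal distribution\n",
    "print(excess_kurtosis(right_skewed))  # positive: the value 10 creates a heavy tail\n",
    "```\n",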
    "\n",
    "---\n",
    "\n",
    "### 6. Explain the concept of correlation and how it is measured.\n",
    "\n",
    "**Answer:** \n",
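    "- **Correlation:** Correlation measures the strength and direction of the relationship between two variables. Correlation coefficients range from -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 indicating no relationship of the type being measured. For Pearson's coefficient, 0 means no linear relationship, although a non-linear relationship may still exist. Correlation alone does not establish causation.\n",
    "  \n",
    "- **Measurement:** The most common measure is Pearson's correlation coefficient (r), the covariance of the two variables divided by the product of their standard deviations; it assesses linear association between continuous variables. Spearman's rank correlation (Pearson's r computed on the ranks) and Kendall's tau are rank-based measures of monotonic relationships, suitable for ordinal data or data with outliers.\n",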
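    "\n",
    "A plain-Python sketch of Pearson's r and Spearman's rho follows. The helper functions are hypothetical, and the simple `ranks` helper assumes there are no tied values (full implementations give tied values their average rank). For `y = x**2`, which is monotonic but not linear, Spearman's rho is exactly 1 while Pearson's r falls slightly below 1:\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def pearson(x, y):\n",
    "    # Pearson's r: covariance over the product of standard deviations\n",
    "    # (the 1/n factors cancel, so plain sums are enough)\n",
    "    n = len(x)\n",
    "    mx, my = sum(x) / n, sum(y) / n\n",
    "    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))\n",
    "    sxx = sum((a - mx) ** 2 for a in x)\n",
    "    syy = sum((b - my) ** 2 for b in y)\n",
    "    return sxy / math.sqrt(sxx * syy)\n",
    "\n",
    "def ranks(values):\n",
    "    # Rank 1 = smallest value; assumes no ties\n",
    "    order = sorted(range(len(values)), key=lambda i: values[i])\n",
    "    result = [0] * len(values)\n",
    "    for rank, i in enumerate(order, start=1):\n",
    "        result[i] = rank\n",
    "    return result\n",
    "\n",
    "def spearman(x, y):\n",
    "    # Spearman's rho is Pearson's r computed on the ranks\n",
    "    return pearson(ranks(x), ranks(y))\n",
    "\n",
    "x = [1, 2, 3, 4, 5]\n",
    "y = [v ** 2 for v in x]  # monotonic but not linear\n",
    "print(pearson(x, y))   # about 0.981: below 1 because the relation is curved\n",
    "print(spearman(x, y))  # 1.0: perfectly monotonic\n",
    "```\n",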
    "\n",
    "---\n",
    "\n",
    "### 7. What is a histogram and how is it used in data analysis?\n",
    "\n",
    "**Answer:** \n",
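    "- **Histogram:** A histogram is a graphical representation of the distribution of a numerical (usually continuous) variable. The range of the data is divided into intervals (bins), and adjacent bars show how many observations (or what proportion of them) fall into each bin.\n",
    "  \n",
    "- **Usage:** Histograms reveal the shape of a distribution (symmetry, skewness, number of peaks) along with its center and spread. Gaps or isolated bars far from the rest can flag potential outliers, which makes histograms a fundamental tool in exploratory data analysis. Because the picture can change with the bin width, it is worth trying more than one bin setting.\n",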
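    "\n",
    "The helper below is a hypothetical sketch of the counting step behind a histogram, not a library function. It places values in equal-width bins and prints a text bar chart for a set of made-up ages. In practice, `numpy.histogram` computes the counts and `matplotlib.pyplot.hist` draws the plot:\n",
    "\n",
    "```python\n",
    "def histogram(values, bins, lo, hi):\n",
    "    # Count values in `bins` equal-width intervals; the last bin also includes hi\n",
    "    width = (hi - lo) / bins\n",
    "    counts = [0] * bins\n",
    "    for v in values:\n",
    "        if v < lo or v > hi:\n",
    "            continue  # ignore values outside the range\n",
    "        idx = min(int((v - lo) // width), bins - 1)\n",
    "        counts[idx] += 1\n",
    "    return counts\n",
    "\n",
    "# Hypothetical ages of 14 survey respondents\n",
    "ages = [22, 25, 27, 31, 33, 35, 36, 38, 41, 44, 47, 52, 58, 63]\n",
    "lo, hi, bins = 20, 70, 5\n",
    "counts = histogram(ages, bins, lo, hi)\n",
    "\n",
    "step = (hi - lo) // bins\n",
    "for i, c in enumerate(counts):\n",
    "    start = lo + i * step\n",
    "    print(f'{start}-{start + step}: ' + '#' * c)\n",
    "```\n",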
    "\n",
    "---\n",
    "\n",
    "### 8. Define standard deviation and variance. How are they related?\n",
    "\n",
    "**Answer:** \n",
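    "- **Standard Deviation:** Standard deviation measures the typical distance of data points from the mean, expressed in the same units as the data. Strictly speaking, it is the root-mean-square deviation rather than the average absolute deviation. It is the square root of the variance.\n",
    "  \n",
    "- **Variance:** Variance quantifies the overall spread of a dataset as the average of the squared differences between each data point and the mean. For a population of size N, the sum of squared deviations is divided by N. For a sample of size n, it is usually divided by n - 1 (Bessel's correction) to give an unbiased estimate of the population variance. Because the deviations are squared, variance is expressed in squared units.\n",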
    "  \n",
    "- **Relationship:** The standard deviation is the square root of the variance (Standard Deviation = √Variance).\n",
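    "\n",
    "A short worked example with made-up data computes the population variance by hand, takes its square root to get the standard deviation, and checks both against the standard library's `statistics.pvariance` and `statistics.pstdev`. For sample data, `statistics.variance` and `statistics.stdev` divide by n - 1 instead:\n",
    "\n",
    "```python\n",
    "import math\n",
    "import statistics\n",
    "\n",
    "data = [2, 4, 4, 4, 5, 5, 7, 9]\n",
    "n = len(data)\n",
    "mean = sum(data) / n  # 5.0\n",
    "\n",
    "# Population variance: the mean of the squared deviations from the mean\n",
    "variance = sum((x - mean) ** 2 for x in data) / n\n",
    "std_dev = math.sqrt(variance)\n",
    "print(variance, std_dev)  # 4.0 2.0\n",
    "\n",
    "# The standard library's population versions agree\n",
    "print(statistics.pvariance(data), statistics.pstdev(data))\n",
    "```\n",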
    "\n",
    "---\n",
    "\n",
    "### 9. Explain the difference between a population and a sample.\n",
    "\n",
    "**Answer:** \n",
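    "- **Population:** A population is the entire set of individuals, objects, or observations about which information is sought. Numerical characteristics of a population (e.g., the population mean μ and standard deviation σ) are called parameters.\n",
    "  \n",
    "- **Sample:** A sample is a subset of the population selected for analysis, typically because measuring the whole population is impractical. Characteristics computed from a sample (e.g., the sample mean x̄ and standard deviation s) are called statistics and are used to make inferences about the population's parameters. This works best when the sample is representative, for example when it is drawn at random.\n",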
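    "\n",
    "The sketch below simulates a hypothetical population of normally distributed values, generated with a fixed random seed purely for illustration. It then draws a simple random sample and compares the population parameters with the sample statistics that estimate them. Note that `pvariance` (divides by N) is used for the population and `variance` (divides by n - 1) for the sample:\n",
    "\n",
    "```python\n",
    "import random\n",
    "import statistics\n",
    "\n",
    "random.seed(42)  # fixed seed so the example is reproducible\n",
    "\n",
    "# Hypothetical population: 1,000 heights in cm from a normal distribution\n",
    "population = [random.gauss(170, 10) for _ in range(1_000)]\n",
    "mu = statistics.mean(population)           # parameter\n",
    "sigma2 = statistics.pvariance(population)  # parameter: divides by N\n",
    "\n",
    "# A simple random sample of 50 members of the population\n",
    "sample = random.sample(population, 50)\n",
    "x_bar = statistics.mean(sample)            # statistic, estimates mu\n",
    "s2 = statistics.variance(sample)           # statistic: divides by n - 1\n",
    "\n",
    "print(f'population mean {mu:.2f}, sample mean {x_bar:.2f}')\n",
    "print(f'population variance {sigma2:.2f}, sample variance {s2:.2f}')\n",
    "```\n",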
    "\n",
    "---\n",
    "\n",
    "### 10. How would you detect and handle missing values in a dataset?\n",
    "\n",
    "**Answer:** \n",
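    "- **Detection:** Missing values can be found by counting them per column, by inspecting summary statistics (e.g., non-missing counts lower than the number of rows), or by visualizing them. In Python, pandas provides `isnull()` (alias `isna()`); for example, `df.isnull().sum()` counts the missing values in each column of a DataFrame `df`.\n",
    "  \n",
    "- **Handling:** Options include removing rows (or columns) with missing values, imputing them with a statistical measure (mean or median for numerical data, mode for categorical data), or using more advanced imputation techniques such as predictive modeling. Dropping data is simple but reduces the sample size and can bias results if values are not missing at random. Simple mean or median imputation understates the variability of the data.\n"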
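    ,"\n",
    "A minimal pandas sketch follows, assuming pandas and NumPy are installed; the DataFrame is invented for illustration. It detects missing values and then applies two of the handling strategies above:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical dataset with gaps\n",
    "df = pd.DataFrame({\n",
    "    'age': [25, np.nan, 31, 40, np.nan],\n",
    "    'city': ['north', 'south', None, 'south', 'east'],\n",
    "})\n",
    "\n",
    "# Detection: count missing values per column\n",
    "print(df.isnull().sum())\n",
    "\n",
    "# Handling option 1: drop every row that contains a missing value\n",
    "dropped = df.dropna()\n",
    "\n",
    "# Handling option 2: impute the numeric column with its median\n",
    "# and the categorical column with its mode\n",
    "filled = df.copy()\n",
    "filled['age'] = filled['age'].fillna(filled['age'].median())\n",
    "filled['city'] = filled['city'].fillna(filled['city'].mode()[0])\n",
    "print(filled)\n",
    "```"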
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ef70716-48a6-4484-9929-613399cc62f7",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
